CHAPTER 16 Getting Straight Talk on Straight-Line Regression 229
With the output shown in Figure 16-4, where the intercept (a) is 76.9 and the
slope (b) is 0.487, you can write the equation of the fitted straight line like this:
SBP = 76.9 + 0.487 × Weight.
Then you can use this equation to predict someone’s SBP if you know their weight.
So, if a person weighs 100 kilograms, you can estimate that that person’s SBP will
be around 76.9 + 0.487 × 100, which is 76.9 + 48.7, or about 125.6 mmHg. Your
prediction probably won’t be exactly on the nose, but it should be better than not
using a predictive model and just guessing.
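If you want to run the same arithmetic for several weights, the fitted equation can be wrapped in a small function. This is only a minimal sketch: the intercept and slope are the values quoted above, and the function name predict_sbp is made up for illustration.

```python
# Coefficients of the fitted straight line quoted in the text.
INTERCEPT = 76.9   # a, in mmHg
SLOPE = 0.487      # b, in mmHg per kilogram of body weight


def predict_sbp(weight_kg):
    """Return the SBP (mmHg) predicted by the fitted line for a given weight."""
    return INTERCEPT + SLOPE * weight_kg


# Reproduce the worked example for a person weighing 100 kilograms.
predicted = predict_sbp(100)
print(f"Predicted SBP at 100 kg: {predicted:.1f} mmHg")
```

Keep in mind that the equation is only trustworthy for weights in the range covered by the data used to fit the line.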
How far off will your prediction be? The residual SE provides a unit of measure-
ment to answer this question. As we explain in the earlier section “Summary sta-
tistics for the residuals,” the residual SE indicates how much the individual points
tend to scatter above and below the fitted line. For the SBP example, this number
is 9.8, so you can expect your prediction to be within about 10 mmHg most of
the time.
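To see where a residual SE comes from, here is a rough sketch that fits a least-squares line and then computes the residual SE as the square root of the sum of squared residuals divided by n - 2. The weights and SBP values below are invented for illustration (they are not the data behind Figure 16-4), and the function names are hypothetical. If the residuals are roughly normally distributed, about two-thirds of the points should fall within one residual SE of the line.

```python
import math


def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residual_se(x, y):
    """Return the residual standard error: sqrt(SS_residual / (n - 2))."""
    a, b = fit_line(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    # Two degrees of freedom are used up by the intercept and the slope.
    return math.sqrt(ss_res / (len(x) - 2))


# Made-up example data (assumption): weights in kg and SBP in mmHg.
weights_kg = [60, 70, 80, 90, 100]
sbp_mmhg = [105, 112, 118, 119, 127]

print(f"Residual SE: {residual_se(weights_kg, sbp_mmhg):.2f} mmHg")
```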
Recognizing What Can Go Wrong with
Straight-Line Regression
Fitting a straight line to a set of data is a relatively simple task, but you still have
to be careful. A computer program does whatever you tell it to, even if it’s some-
thing you shouldn’t do.
Those new to straight-line regression may slip up in the following ways:
» Fitting a straight line to curved data: Examining the pattern of residuals in
the residuals versus fitted chart in Figure 16-5 can let you know if you have
this problem.
» Ignoring outliers in the data: Outliers — especially those in the corners of a
scatterplot like the one in Figure 16-3 — can mess up all the classical statistical
analyses, and regression is no exception. One or two data points that are way
off the main trend of the points will drag the fitted line away from the other
points. That’s because least-squares fitting penalizes each point by the square of
its vertical distance from the line, so a far-off outlier carries a
disproportionately large penalty and strongly influences where the line ends up.
Always look at a scatterplot of your data to make sure outliers aren’t present.
Examine the residuals to ensure they are distributed normally above and
below the fitted line.
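Both pitfalls are easy to demonstrate with a few lines of code. The sketch below uses invented numbers (not the chapter's SBP data) and a hypothetical helper called fit_line. First it fits a straight line to data that actually follow a curve and prints the sign of each residual; then it fits a line with and without a single outlier placed in a corner of the scatterplot to show how far the slope gets dragged.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Pitfall 1: straight line fitted to curved data (y = x squared).
x_curved = [1, 2, 3, 4, 5, 6]
y_curved = [xi ** 2 for xi in x_curved]
a, b = fit_line(x_curved, y_curved)
residuals = [yi - (a + b * xi) for xi, yi in zip(x_curved, y_curved)]
# A run of same-signed residuals in the middle, flanked by the opposite
# sign at both ends, is the telltale pattern of curvature.
signs = " ".join("+" if r > 0 else "-" for r in residuals)
print("Residual signs, left to right:", signs)

# Pitfall 2: one outlier in a corner of the scatterplot.
x_clean = [1, 2, 3, 4, 5, 6]
y_clean = [2, 4, 6, 8, 10, 12]          # points lie exactly on y = 2x
_, slope_clean = fit_line(x_clean, y_clean)

x_outlier = x_clean + [7]               # far right of the plot...
y_outlier = y_clean + [0]               # ...and at the very bottom
_, slope_with_outlier = fit_line(x_outlier, y_outlier)

print(f"Slope without outlier: {slope_clean:.2f}")
print(f"Slope with outlier:    {slope_with_outlier:.2f}")
```

In real data the patterns are rarely this clean, so treat the output as a prompt to look at the scatterplot and the residual plot rather than as a formal test.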